"VIDEO ENCODING AND DECODING METHODS AND CORRESPONDING 
DEVICES" 

FIELD OF THE INVENTION 

The present invention generally relates to the field of video compression 
and, for instance, more particularly to the video standards of the MPEG family 
(MPEG-1, MPEG-2, MPEG-4) and to the video coding recommendations of the ITU 
H.26x family (H.261, H.263 and extensions). More specifically, the invention relates 
to a video encoding method applied to an input sequence of frames in which each 
frame is subdivided into blocks of arbitrary size, said method comprising, for at least a 
part of said blocks of the current frame, the steps of: 

- generating on a block basis motion-compensated frames, each one 
being obtained from each current original frame and a previous reconstructed frame; 

- generating from said motion-compensated frames residual signals; 

- using a so-called matching pursuit (MP) algorithm for decomposing 
each of said generated residual signals into coded dictionary functions called atoms, 
the other blocks of the current frame being processed by means of other coding 
techniques; 

- coding said atoms and the motion vectors determined during the motion 
compensation step, for generating an output coded bitstream. 

The invention also relates to a corresponding video decoding method and 
to the encoding and decoding devices for carrying out said encoding and decoding 
methods. 

BACKGROUND OF THE INVENTION 

In the current video standards (up to the MPEG-4 video coding standard 
and the H.264 recommendation), the video, described in terms of one luminance channel 
and two chrominance channels, can be compressed by means of two coding modes applied to 
each channel: the "intra" mode, exploiting in a given channel the spatial redundancy 
of the pixels (picture elements) within each image, and the "inter" mode, exploiting the 
temporal redundancy between separate images (or frames). The inter mode, relying on 
a motion compensation operation, makes it possible to describe an image from one (or more) 
previously decoded image(s) by encoding the motion of the pixels from one (or more) 
image(s) to another one. Usually, the current image to be coded is partitioned into 
independent blocks (for instance, of size 8 x 8 or 16 x 16 pixels in MPEG-4, or of size 
4 x 4, 4 x 8, 8 x 4, 8 x 8, 8 x 16, 16 x 8 and 16 x 16 in H.264), each of them being 
assigned a motion vector (the three channels share such a motion description). A 
prediction of said image can then be constructed by displacing pixel blocks from a 
reference image according to the set of motion vectors associated with each block. 

Finally, the difference, or residual signal, between the current image to be encoded and 
its motion-compensated prediction can be encoded in the intra mode (with 8x8 
discrete cosine transforms - or DCTs - for MPEG-4, or 4 x 4 DCTs for H.264 in the 
main level profile). 
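
The prediction and residual steps described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the scheme of any particular standard: the function names, the dictionary of per-block motion vectors and the clamping of source coordinates at the frame border are all hypothetical choices made for the example.

```python
def predict_frame(reference, motion_vectors, block=8):
    """Build a prediction by copying displaced blocks from the reference frame.

    reference: 2D list of pixel values (rows x cols).
    motion_vectors: dict mapping (block_row, block_col) -> (dy, dx);
    blocks without an entry use a zero vector.
    Source coordinates are clamped to the frame (an illustrative choice).
    """
    rows, cols = len(reference), len(reference[0])
    prediction = [[0] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            dy, dx = motion_vectors.get((y // block, x // block), (0, 0))
            sy = min(max(y + dy, 0), rows - 1)
            sx = min(max(x + dx, 0), cols - 1)
            prediction[y][x] = reference[sy][sx]
    return prediction


def residual(current, prediction):
    """Pixel-wise difference between the current frame and its prediction."""
    return [[c - p for c, p in zip(cr, pr)] for cr, pr in zip(current, prediction)]
```

When the motion vectors describe the true displacement, the residual is zero everywhere; otherwise it carries the prediction error that the transform stage has to encode.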

The DCT is probably the most widely used transform, because it offers 
good compression efficiency in a wide variety of coding situations, especially at 
medium and high bitrates. However, at low bitrates, the hybrid motion-compensated 
DCT structure may not be able to deliver an artefact-free sequence, for two reasons. 
First, the structure of the motion-compensated inter prediction grid becomes visible, 
with blocking artifacts. Moreover, the block edges of the DCT basis functions become 
visible in the image grid, because too few coefficients are quantized - and too coarsely 
- to make up for these blocking artifacts and to reconstruct smooth objects in the 
image. 

The document "Very low bit-rate video coding based on matching 
pursuits", R. Neff and A. Zakhor, IEEE Transactions on Circuits and Systems for 
Video Technology, vol. 7, no. 1, February 1997, pp. 158-171, describes a new 
motion-compensated system including a video compression algorithm based on the so-called 
matching pursuit (MP) algorithm, a technique developed about ten years ago (see the 
document "Matching pursuits with time-frequency dictionaries", S. G. Mallat and 
Z. Zhang, IEEE Transactions on Signal Processing, vol. 41, no. 12, December 1993, 
pp. 3397-3414). Said technique provides a way to iteratively decompose any function 
or signal (for example, image, video, ...) into a linear expansion of waveforms 
belonging to a redundant dictionary of basis functions, well localized both in time and 
frequency and called atoms. A general family of time-frequency atoms can be created 
by scaling, translating and modulating a single function $g(t) \in L^2(\mathbb{R})$ supposed to be 
real and continuously differentiable. These dictionary functions may be designated 
by: 

$$ g_\gamma(t) \in G \quad (G = \text{dictionary set}), \qquad (1) $$
$\gamma$ (gamma) being an indexing parameter associated with each particular dictionary 
element (or atom). As described in the first cited document, assuming that the 
functions $g_\gamma(t)$ have unit norm, i.e. $\langle g_\gamma(t), g_\gamma(t) \rangle = 1$, the decomposition of a 
one-dimensional time signal $f(t)$ begins by choosing $\gamma$ to maximize the absolute value of 
the following inner product: 

$$ p = \langle f(t), g_\gamma(t) \rangle, \qquad (2) $$

where $p$ is called an expansion coefficient for the signal $f(t)$ onto the dictionary 
function $g_\gamma(t)$. A residual signal $R$ is then computed: 

$$ R(t) = f(t) - p \cdot g_\gamma(t) \qquad (3) $$

and this residual signal is expanded in the same way as the original signal $f(t)$. An 
atom is, in fact, the name given to each pair $(p_k, \gamma_k)$, where $k$ is the rank of the iteration 
in the matching pursuit procedure. After a total of $M$ stages of this iterative procedure 
(where each stage $n$ yields a dictionary structure specified by $\gamma_n$, an expansion 
coefficient $p_n$ and a residual $R_n$ which is passed on to the next stage), the original 
signal $f(t)$ can be approximated by a signal $\hat{f}(t)$ which is a linear combination of the 
dictionary elements thus obtained. The iterative procedure is stopped when a 
predefined condition is met, for example when either a set number of expansion coefficients 
has been generated or some energy threshold for the residual has been reached. 
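
The iterative procedure of equations (2) and (3) can be sketched in Python as follows. This is a minimal illustration for discrete one-dimensional signals, assuming a dictionary given as a list of unit-norm vectors; the function names and the default energy threshold are hypothetical, and no quantization of the coefficients is performed.

```python
def inner(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def matching_pursuit(signal, dictionary, max_atoms, energy_threshold=1e-12):
    """Greedy decomposition following equations (2) and (3).

    dictionary: list of unit-norm vectors with the same length as signal.
    Returns (atoms, residual), where atoms is a list of (gamma, p) pairs.
    Stops after max_atoms atoms or when the residual energy is small enough.
    """
    residual = list(signal)
    atoms = []
    for _ in range(max_atoms):
        if inner(residual, residual) <= energy_threshold:
            break
        # Equation (2): choose gamma maximizing |<R, g_gamma>|.
        products = [inner(residual, g) for g in dictionary]
        gamma = max(range(len(dictionary)), key=lambda k: abs(products[k]))
        p = products[gamma]
        # Equation (3): remove the contribution of the selected function.
        residual = [r - p * g for r, g in zip(residual, dictionary[gamma])]
        atoms.append((gamma, p))
    return atoms, residual


def reconstruct(atoms, dictionary):
    """Linear combination of the selected dictionary elements."""
    out = [0.0] * len(dictionary[0])
    for gamma, p in atoms:
        out = [o + p * g for o, g in zip(out, dictionary[gamma])]
    return out
```

Both stopping conditions mentioned above appear in the loop: the atom budget (`max_atoms`) and the residual energy threshold.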

In the first document mentioned above, describing a system based on said 
MP algorithm and which performs better than DCT-based ones at low bitrates, original 
images are first motion-compensated, using a tool called overlapped block-motion 
compensation which avoids or reduces blocking artifacts by blending the boundaries 
of predicted/displaced blocks (the edges of the blocks are therefore smoothed and the 
block grid is less visible). After the motion prediction image is formed, it is subtracted 
from the original one, in order to produce the motion residual. Said residual is then 
coded, using the MP algorithm extended to the discrete two-dimensional (2D) domain, 
with a proper choice of a basis dictionary (said dictionary consists of an overcomplete 
collection of 2D separable Gabor functions g, shown in Fig.1). 

A residual signal $\hat{f}$ is then reconstructed by means of a linear combination 
of $M$ dictionary elements: 

$$ \hat{f} = \sum_{n=1}^{M} p_n \cdot g_{\gamma_n} \qquad (4) $$

If the dictionary basis functions have unit norm, $p_n$ is the quantized inner product $\langle \cdot , \cdot \rangle$ 
between the basis function $g_{\gamma_n}$ and the residual updated iteratively, that is to say: 

$$ p_n = \Big\langle f - \sum_{k=1}^{n-1} p_k \cdot g_{\gamma_k}, \; g_{\gamma_n} \Big\rangle \qquad (5) $$

the pairs $(p_n, \gamma_n)$ being the atoms. In the work described by the authors of the 
document, no restriction is placed on the possible location of an atom in an image (see 
Fig.2). The 2D Gabor functions forming the dictionary set are defined in terms of a 
prototype Gaussian window: 

$$ w(t) = \sqrt[4]{2} \, e^{-\pi t^2} \qquad (6) $$

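A monodimensional (1D) discrete Gabor function is defined as a scaled, modulated 
Gaussian window: 

$$ g_\alpha(i) = K_\alpha \cdot w\!\left( \frac{i - \frac{N}{2} + 1}{s} \right) \cdot \cos\!\left( \frac{2\pi\xi \left( i - \frac{N}{2} + 1 \right)}{N} + \phi \right) \qquad (7) $$

with $i \in \{0, 1, \ldots, N-1\}$. 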
The constant $K_\alpha$ is chosen so that $g_\alpha(i)$ is of unit norm, and $\alpha = (s, \xi, \phi)$ is a triple 
consisting, respectively, of a positive scale, a modulation frequency, and a phase shift. 
If $S$ is the set of all such triples $\alpha$, then the 2D separable Gabor functions of the 
dictionary have the following form: 

$$ G_{\alpha,\beta}(i, j) = g_\alpha(i) \, g_\beta(j) \quad \text{for } i, j \in \{0, 1, \ldots, N-1\}, \text{ and } \alpha, \beta \in S \qquad (8) $$

The set of available dictionary triples and associated sizes (in pixels) indicated in the 
document as forming the 1D basis set (or dictionary) is shown in the following 
Table 1: 






wo 2005/013201 PCT/IB2004/002476 


Table 1 

  k     s_k     ξ_k     φ_k     size (pixels)
  0     1.0     0.0     0       1
  1     3.0     0.0     0       5
  2     5.0     0.0     0       9
  3     7.0     0.0     0       11
  4     9.0     0.0     0       15
  5     12.0    0.0     0       21
  6     14.0    0.0     0       23
  7     17.0    0.0     0       29
  8     20.0    0.0     0       35
  9     1.4     1.0     π/2     3
 10     5.0     1.0     π/2     9
 11     12.0    1.0     π/2     21
 12     16.0    1.0     π/2     27
 13     20.0    1.0     π/2     35
 14     4.0     2.0     0       7
 15     4.0     3.0     0       7
 16     8.0     3.0     0       13
 17     4.0     4.0     0       7
 18     4.0     2.0     π/4     7
 19     4.0     4.0     π/4
To obtain this parameter set, a training set of motion residual images was decomposed 
using a dictionary derived from a much larger set of parameter triples. The dictionary 
elements which were most often matched to the training images were retained in the 
reduced set. The obtained dictionary was specifically designed so that atoms can freely 
match the structure of motion residual images when their influence is not confined to 
the boundaries of the block they lie in (see Fig.2, showing the example of an atom 
placed in a block-divided image without block-restrictions). 
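
The construction of the dictionary from the triples of Table 1 can be sketched in Python as follows. This is an illustration under assumptions, not the implementation of the cited document: it takes the window of equation (6) to be the fourth root of 2 times exp(-pi t^2), uses N as the denominator of the cosine argument in equation (7), obtains the constant K by explicit normalization, and assumes a size of 7 pixels for k = 19 (by analogy with the other rows of scale 4.0, since that size is not given in the table). All function and variable names are hypothetical.

```python
import math

HALF_PI, QUARTER_PI = math.pi / 2, math.pi / 4

# (scale s, frequency xi, phase phi, size N) for k = 0..19, after Table 1.
TABLE_1 = [
    (1.0, 0.0, 0.0, 1), (3.0, 0.0, 0.0, 5), (5.0, 0.0, 0.0, 9),
    (7.0, 0.0, 0.0, 11), (9.0, 0.0, 0.0, 15), (12.0, 0.0, 0.0, 21),
    (14.0, 0.0, 0.0, 23), (17.0, 0.0, 0.0, 29), (20.0, 0.0, 0.0, 35),
    (1.4, 1.0, HALF_PI, 3), (5.0, 1.0, HALF_PI, 9), (12.0, 1.0, HALF_PI, 21),
    (16.0, 1.0, HALF_PI, 27), (20.0, 1.0, HALF_PI, 35),
    (4.0, 2.0, 0.0, 7), (4.0, 3.0, 0.0, 7), (8.0, 3.0, 0.0, 13),
    (4.0, 4.0, 0.0, 7), (4.0, 2.0, QUARTER_PI, 7),
    (4.0, 4.0, QUARTER_PI, 7),  # size 7 is an assumption for k = 19
]


def window(t):
    """Prototype Gaussian window, assumed form of equation (6)."""
    return 2 ** 0.25 * math.exp(-math.pi * t * t)


def gabor_1d(s, xi, phi, n):
    """Unit-norm 1D discrete Gabor function of length n, after equation (7)."""
    raw = []
    for i in range(n):
        t = i - n / 2 + 1
        raw.append(window(t / s) * math.cos(2 * math.pi * xi * t / n + phi))
    norm = math.sqrt(sum(v * v for v in raw))
    # Dividing by the norm plays the role of the constant K.
    return [v / norm for v in raw]


def gabor_2d(alpha, beta):
    """Separable 2D function of equation (8): outer product of two 1D functions."""
    gi = gabor_1d(*alpha)
    gj = gabor_1d(*beta)
    return [[a * b for b in gj] for a in gi]
```

With 20 one-dimensional functions, the separable construction yields 20 x 20 = 400 two-dimensional functions, which matches the number of basis functions mentioned for Fig.1. In this sketch each 2D function has the size of its two 1D factors, so the two dimensions may differ.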
However, the approach described in the cited document suffers from several 
limitations. The first one is related to the continuous structure of the Gabor dictionary. 
Because atoms can be placed at all pixel locations without any restriction and 
therefore span several motion-compensated blocks, the MP algorithm cannot represent 
blocking artefacts in the residual signal with a limited number of smooth atoms. This is 
the reason why it is necessary to have some kind of overlapped motion estimation, in 
order to limit the blocking artifacts. Second, if a classical block-based motion compensation 
(i.e. without overlapping windows) is used, the smooth basis functions may not be 
appropriate to make up for blocking artifacts (indeed, it has recently been shown that 
coding gains could be made when the size of the residual coding transform is matched 
to the size of the motion-compensated block). Third, it is difficult to combine intra and 
inter blocks in a coded frame (in the cited document, no DCT intra macroblock exists, 
probably in order to avoid discontinuities on the boundaries of blocks coded in intra 
and inter mode that would be badly modelled by the smooth structure of Gabor basis 
functions). 

SUMMARY OF THE INVENTION 

It is therefore an object of the invention to propose a video encoding method in 
which these limitations no longer exist. 

To this end, the invention relates to a video encoding method such as defined in 
the introductory part of the description and which is moreover such that, when using 
said MP algorithm, any atom acts only on one block $B$ at a time, said block-restriction 
leading to the fact that the reconstruction of a residual signal $f$ is obtained from a 
dictionary that is composed of basis functions $g_{\gamma_n}|_B$ restricted to the block $B$ 
corresponding to the indexing parameter $\gamma_n$, according to the following 2D spatial 
domain operation: 

$$ g_{\gamma_n}|_B(i, j) = g_{\gamma_n}(i, j) \quad \text{if pixel } (i, j) \in B $$
$$ g_{\gamma_n}|_B(i, j) = 0 \quad \text{otherwise (i.e. } (i, j) \notin B\text{)}. $$

The main interest of this approach resides in the fact that the MP atoms are 
restricted to the motion-compensated blocks. This makes it possible to better model the blocky 
structure of residual signals, implicitly augments the dictionary diversity for the same 
coding cost and offers the possibility of alternating MP and DCT transforms since 
there is no interference across block boundaries. It also avoids the need to resort to 
overlapped motion compensation to limit blocking artefacts. 

It is another object of the invention to propose a video encoding device 
for carrying out said encoding method. 
It is still another object of the invention to propose a video decoding method and 
device for decoding signals coded by means of said video encoding method and 
device. 

BRIEF DESCRIPTION OF THE DRAWINGS 

The present invention will now be described, by way of example, with reference 
to the accompanying drawings in which: 

- Fig.1 allows a visualization of the 400 basis functions of the 2D Gabor 
dictionary used in the implementation of the matching pursuit algorithm; 

- Fig.2 illustrates the example of an atom placed in a block-divided image 
without block-restrictions; 

- Fig.3 illustrates an example of hybrid video coder according to the 
invention; 

- Fig.4 shows an example of a video encoding device for implementing an MP 
algorithm; 

- Fig.5 illustrates the case of a block-restricted matching pursuit residual coding, 
with an atom being confined into the motion-compensated grid and acting only on one 
block at a time; 

- Fig.6 illustrates an example of hybrid video decoder according to the 
invention; 

- Fig.7 shows an example of a video decoding device implementing the MP 
algorithm. 

DETAILED DESCRIPTION OF THE INVENTION 

A simplified block diagram of a video encoding device implementing a hybrid 
video coder using multiple coding engines is shown in Fig.3. Several coding engines 
implement predetermined coding techniques: for instance, a coding engine 31 can 
implement the INTRA-DCT coding method, a second one 32 the INTER-DCT coding 
method, and a third one 33 the matching pursuit algorithm. Each frame of the input 
video sequence ("video signal") is received by a block partitioner device 34, which 
partitions the image into individual blocks of varying size, and decides which coding 
engine will process the current original block. The decisions representing the block 
position, its size and the selected coding engine are then inserted into the bitstream by a 
coding device 35. The current original signal block is then transferred to the selected 
coding engine (the engine 33 in the situation illustrated in Fig.3). 
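
The role of the block partitioner 34 can be sketched in Python as follows. This is a simplified illustration under assumptions: it uses a fixed block size rather than blocks of varying size, and the criterion for choosing an engine is left to a caller-supplied function, since no such criterion is specified here. The names `BlockDecision`, `partition_and_dispatch` and `choose_engine` are hypothetical.

```python
from dataclasses import dataclass

ENGINES = ("INTRA_DCT", "INTER_DCT", "MP")  # engines 31, 32 and 33 of Fig.3


@dataclass
class BlockDecision:
    """Information inserted into the bitstream for one block."""
    row: int
    col: int
    height: int
    width: int
    engine: str


def partition_and_dispatch(frame_h, frame_w, block, choose_engine):
    """Split a frame into blocks and record one decision per block.

    choose_engine(row, col, height, width) is a hypothetical callback
    returning one of ENGINES; the selection criterion is an assumption.
    Blocks on the bottom and right borders are cropped to the frame.
    """
    decisions = []
    for row in range(0, frame_h, block):
        for col in range(0, frame_w, block):
            h = min(block, frame_h - row)
            w = min(block, frame_w - col)
            engine = choose_engine(row, col, h, w)
            if engine not in ENGINES:
                raise ValueError(f"unknown engine: {engine}")
            decisions.append(BlockDecision(row, col, h, w, engine))
    return decisions
```

Each `BlockDecision` gathers the three items that the coding device 35 writes for a block: its position, its size and the selected coding engine.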

A matching pursuit coding engine is further illustrated as a simplified block 
diagram in Fig.4, showing an example of a video encoding device for implementing an 
MP algorithm. Each of the original signal blocks of the input video sequence 
assigned to the coding engine 33 is received on one side by motion compensating 
means 41 for determining motion vectors (said motion vectors are conventionally 
found using the block matching algorithm), and the vectors thus obtained are coded by 
motion vector coding means 42, the coded vectors being delivered to a multiplexer 43 
(referenced, but not shown). On the other side, a subtracter 44 delivers on its output 
the residual signal between the current image and its prediction. Said residual signal is 
then decomposed into atoms (the dictionary of atoms is referenced 47) and the atom 
parameters thus determined (module 45) are coded (module 46). The coded motion 
vectors and atom parameters then form a bitstream that is sent to match a predefined 
condition for each frame of the sequence. 
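
The block matching algorithm mentioned for the motion compensating means 41 can be sketched in Python as follows. This is a minimal full-search illustration, assuming a sum-of-absolute-differences cost and an integer-pixel search window; the function names, the default block size and the search range are hypothetical choices.

```python
def sad(current, reference, y0, x0, dy, dx, block):
    """Sum of absolute differences between a block and its displaced candidate."""
    total = 0
    for y in range(block):
        for x in range(block):
            total += abs(current[y0 + y][x0 + x]
                         - reference[y0 + y + dy][x0 + x + dx])
    return total


def block_matching(current, reference, y0, x0, block=8, search=4):
    """Full-search motion vector (dy, dx) for the block at (y0, x0).

    Returns the best vector and its cost. The zero vector is the starting
    point, and a candidate replaces it only if its cost is strictly lower.
    """
    rows, cols = len(reference), len(reference[0])
    best_vector = (0, 0)
    best_cost = sad(current, reference, y0, x0, 0, 0, block)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidates that fall outside the reference frame.
            if not (0 <= y0 + dy and y0 + dy + block <= rows):
                continue
            if not (0 <= x0 + dx and x0 + dx + block <= cols):
                continue
            cost = sad(current, reference, y0, x0, dy, dx, block)
            if cost < best_cost:
                best_vector, best_cost = (dy, dx), cost
    return best_vector, best_cost
```

The vector returned for each block is what the motion vector coding means 42 would then encode, and the displaced reference block is the prediction subtracted by the subtracter 44.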

This encoding engine 33 carries out a method of coding an input bitstream that 
comprises the following steps. First, as in most coding structures, the original frames 
of the input sequence are motion-compensated (each one is motion-compensated on 
the basis of the previous reconstructed frame, and the motion vectors determined 
during said motion-compensation step are stored in view of their later transmission). 
Residual signals are then generated by difference between the current frame and the 
associated motion-compensated prediction. Each of said residual signals is then 
compared with a dictionary of functions consisting of a collection of 2D separable 
Gabor functions, in order to generate a dictionary structure $g_\gamma(t)$ specified by the 
indexing parameter $\gamma_n$, an expansion coefficient $p(n)$ and a residual $R_n(t) - p \cdot g_\gamma(t)$ 
which is passed on to the next stage of this iterative procedure. Once the atom 
parameters are found, they can be coded (together with the motion vectors previously 
determined), the coded signals thus obtained forming the bitstream sent to the decoder. 

The technical solution proposed according to the invention consists in confining 
the influence of atoms to the boundaries of the block they lie in. This block-restriction 
means that an atom acts only on one block at a time, confined into the 
motion-compensation grid, as illustrated in Fig.5. This block-restriction modifies the signal 
matching pursuit algorithm in the following manner. 

If one assumes that it is wanted to obtain the MP decomposition of the 2D 
residual in a block $B$ of size M x N pixels after motion-compensation, and if one 
considers the MP dictionary restricted to $B$, the elements $g_{\gamma_n}|_B$ of said dictionary are 
obtained by means of the relationships (9) and (10): 



The interest of this approach resides in the fact that because a single atom cannot span 
several blocks, it does not have to deal with the high-frequency discontinuities at block 
edges. Instead, it can be adapted to block boundaries, and even to block sizes, by 
designing block-size dependent dictionaries. Moreover, since overlapped motion 
compensation is no longer mandatory to preserve the MP efficiency, classical motion 
compensation may be used. 

The preferred embodiment of encoding device described above sends a 
bitstream which is received by a corresponding decoding device. A simplified block 
diagram of a video decoding device according to the invention and implementing a 
hybrid video decoder using multiple decoding engines is shown in Fig.6. The 
transmitted bitstream is received on one side by a block partition decoding device 64, 
which decodes the current block position, its size, and the decoding method. Given the 
decoding method, the bitstream elements are then transferred to the corresponding 
decoding engine, 61 or 62 or 63 in the case of Fig.6, which will in turn decode the 
assigned blocks and output the video signal reconstructed block. The available 
decoding engines can be for instance an INTRA-DCT block decoder 61, an INTER-DCT 
block decoder 62, and a matching pursuit block decoder 63. 



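$$ g_{\gamma_n}|_B(i, j) = g_{\gamma_n}(i, j) \quad \text{if pixel } (i, j) \in B \qquad (9) $$
$$ g_{\gamma_n}|_B(i, j) = 0 \quad \text{otherwise (i.e. } (i, j) \notin B\text{)} \qquad (10) $$

In this case, since $g_{\gamma_n}|_B$ does not necessarily have a unit norm, $p_n$ needs to be 
reweighted as: 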
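
The block restriction of relationships (9) and (10) and one plausible reweighting can be sketched in Python as follows. The reweighting used here, in which the coefficient is the inner product divided by the squared norm of the restricted function, is an assumption (the usual least-squares coefficient for a function that is not of unit norm) and is not taken from the text. The function names and the way an atom is positioned by its top-left corner are likewise hypothetical.

```python
def restrict_to_block(atom, top, left, block_h, block_w, y0, x0):
    """Relationships (9) and (10): keep atom samples inside block B, zero elsewhere.

    atom is a 2D list whose top-left sample sits at frame position (top, left);
    block B starts at (y0, x0). Returns a block_h x block_w list.
    """
    out = [[0.0] * block_w for _ in range(block_h)]
    for i, row in enumerate(atom):
        for j, value in enumerate(row):
            y, x = top + i - y0, left + j - x0
            if 0 <= y < block_h and 0 <= x < block_w:
                out[y][x] = value
    return out


def dot2d(a, b):
    """Inner product of two 2D lists of identical shape."""
    return sum(u * v for ra, rb in zip(a, b) for u, v in zip(ra, rb))


def reweighted_coefficient(residual_block, restricted_atom):
    """Assumed reweighting: <R, g|B> / <g|B, g|B>."""
    energy = dot2d(restricted_atom, restricted_atom)
    if energy == 0.0:
        return 0.0
    return dot2d(residual_block, restricted_atom) / energy


def mp_step(residual_block, restricted_atoms):
    """One greedy iteration over atoms already restricted to the block.

    Selects the atom removing the most energy, <R, g|B>^2 / <g|B, g|B>,
    and returns (index, coefficient, updated residual).
    """
    def gain(g):
        energy = dot2d(g, g)
        return dot2d(residual_block, g) ** 2 / energy if energy > 0.0 else 0.0

    index = max(range(len(restricted_atoms)),
                key=lambda k: gain(restricted_atoms[k]))
    g = restricted_atoms[index]
    p = reweighted_coefficient(residual_block, g)
    updated = [[r - p * v for r, v in zip(rr, gr)]
               for rr, gr in zip(residual_block, g)]
    return index, p, updated
```

Because the restricted atom is zero outside B, the update never modifies pixels of neighbouring blocks, which is what allows adjacent blocks to be coded by different engines.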
An example of matching pursuit decoding engine is further illustrated in Fig.7, 
showing an example of a video decoding device implementing the MP algorithm. The 
bitstream elements are received by an entropy decoder device 71, which forwards the 
decoded atom parameters to an atom device 72 (the dictionary of atoms is referenced 
73) which reconstructs the matching pursuit functions at the decoded position within 
the assigned video block to form the decoded residual signal. The entropy decoder 
device also outputs motion vectors which are fed into a motion compensation device 74 
to form a motion prediction signal from previously reconstructed video signals. The 
motion prediction and the reconstructed residual signal are then summed in an adder 
75 to produce a video signal reconstructed block. 
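
The reconstruction performed by the atom device 72 and the adder 75 can be sketched in Python as follows. This is an illustration under assumptions: entropy decoding is taken as already done, each decoded atom is represented by a hypothetical tuple (dictionary index, coefficient, top, left) with positions relative to the block, and the motion prediction for the block is supplied by the caller.

```python
def decode_block(atoms, dictionary, prediction):
    """Rebuild one block: sum the decoded atoms, then add the motion prediction.

    atoms: list of (index, coefficient, top, left) tuples (hypothetical layout).
    dictionary: list of 2D lists, the atom shapes (referenced 73 in Fig.7).
    prediction: 2D list, the motion-compensated block from device 74.
    """
    height, width = len(prediction), len(prediction[0])
    residual = [[0.0] * width for _ in range(height)]
    for index, coefficient, top, left in atoms:
        for i, row in enumerate(dictionary[index]):
            for j, value in enumerate(row):
                y, x = top + i, left + j
                # Block restriction: samples outside the block are dropped.
                if 0 <= y < height and 0 <= x < width:
                    residual[y][x] += coefficient * value
    # Adder 75: motion prediction plus reconstructed residual.
    return [[p + r for p, r in zip(pr, rr)]
            for pr, rr in zip(prediction, residual)]
```

An atom positioned partly outside the block contributes only through the samples that fall inside it, mirroring the restriction applied at the encoder.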



